Transformational Generative Grammar


The  Theory  of  Transformational  Grammar  was  introduced  by  Noam  Chomsky  in 
Syntactic  Structures  [Chomsky  1957]  and  revised  in  Aspects  of  the  Theory  of  Syntax 
[Chomsky  1965].  The  revised  model  is  commonly  referred  to  as  the  Standard  Theory  of 
Transformational  Grammar,  and  has  been  extremely  influential  in  the  field  of  Linguistics. 
This  paper  explains  the  Theory  of  Transformational  Grammar,  surveys  current  work  in  the 
field,  and  identifies  areas  for  further  research. 

Transformational grammars were derived from phrase structure grammars, generalizing
the notion of rewrite rules in order to handle the problem of discontinuous dependencies in
a precise, uniform manner. A phrase structure grammar generates tree diagrams by a series
of rewrite rules which indicate what lexical categories (parts of speech) make up larger
categories. A transformational grammar adds transformational rules that operate on trees
originally built by phrase structure rules to generate new trees. The important difference is
that phrase structure grammars treat rules as constraints on the structure that can be
assigned to a sentence, whereas transformational grammars allow the application of new
kinds of rules which transform structures in the course of a derivation, thereby creating
new structures. An example of a transformational rule would be (from [Akmajian and
Heny; 1975]):


To  form  a  yes/no  question,  take  a  declarative  sentence  (statement)  and  move  the 
first  auxiliary  to  the  left  of  the  subject  NP. 

The  generative  paradigm  models  the  derivation  process  of  human  language, 
succinctly  capturing  the  generalized  knowledge  and  rules  humans  use  to  create  and 
understand  utterances.  Phrase  structure  analysis,  on  the  other  hand,  is  superficial  in  that  it 
is  determined  solely  by  the  ordering  of  elements  within  a  sentence.  This  superficiality 
becomes evident when examining sentences which have similar syntactic structure. A
person's intuitions about the similarity between sentences are not based solely on ordering.
Winograd argues that this underlying similarity is recognized by the reader based on
syntactic structure rather than on meaning [Winograd; 1983]. If this is the case, then an
analysis based on ordering alone is not sufficient.


Phrase Structure Grammar

A  phrase  structure  grammar  is  a  4-tuple  (V,T,P,S)  where  V  and  T  are  disjoint  finite 
sets,  called  the  set  of  variables  (or  nonterminals)  and  terminals,  respectively.  In  a  grammar 
describing  a  human  language,  for  example,  terminals  are  words;  the  set  of  nonterminals 
might  include  symbols  such  as  N  and  NP,  representing  a  noun  and  a  noun  phrase, 
respectively. The symbol P represents a set of productions (rewrite rules) of the form
α -> β. The symbol S is a special symbol called the start symbol. Basic phrase markers
(tree diagrams) are generated by rewrite rules of the sort:

(1a) S -> NP (Aux) VP

(1b) NP -> N

(1c) VP -> V (S)

(1d) VP -> V (NP)

Optional constituents are shown in parentheses. Rule (1c) can be interpreted as: a VP
(verb phrase) consists of a V (verb) followed by an optional S (sentence). Note the
recursive nature of (1a) and (1c). The PS (phrase structure) rule (1a) expands to
include a VP; rule (1c), in turn, may include S as a constituent. Thus, an infinite set of
structures can be generated from a finite set of rules, as in Figure 1.
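
The recursion between the S rule and the VP rule can be made concrete with a short Python sketch. It is an illustration only, not part of the original presentation: the names RULES and expand are hypothetical, each optional constituent is spelled out as a separate alternative, and the depth bound is an assumption added so that the enumeration terminates.

```python
from itertools import product

# The four phrase structure rules, with every optional constituent
# written out as its own alternative (an assumption of this sketch).
RULES = {
    "S": [["NP", "VP"], ["NP", "Aux", "VP"]],
    "NP": [["N"]],
    "VP": [["V"], ["V", "S"], ["V", "NP"]],
}


def expand(symbol, depth):
    """Return every string of lexical categories derivable from `symbol`
    using at most `depth` nested rule applications."""
    if symbol not in RULES:
        return [[symbol]]  # lexical category such as N, V, Aux
    if depth == 0:
        return []  # no room left to rewrite a phrasal category
    results = []
    for rhs in RULES[symbol]:
        parts = [expand(child, depth - 1) for child in rhs]
        for combo in product(*parts):
            results.append([cat for part in combo for cat in part])
    return results


for limit in (3, 5):
    print(limit, len(expand("S", limit)))
```

Raising the depth bound admits more embedded sentences, so the number of derivable category strings keeps growing; the pattern N Aux V N Aux V ("Jack may think Jill will say") first appears once the bound allows one level of embedding.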


[Figure 1: tree diagram for "Jack may think Jill will say ...".]

Figure 1. An illustration of sentence embedding.


To be concise, we may abbreviate rules (1c) and (1d) as follows:

(2) VP -> V ({S, NP})

The notation used in rule (2) means that a verb phrase is composed of a verb which
may be followed by either a sentence or a noun phrase. So, parentheses indicate an option,
as before, and braces indicate alternative choices. Let's investigate one of the problems
with  our  PS  grammar  as  described  thus  far.  Our  grammar  does  not  allow  us  to  make  any 
distinctions  about  transitive  verbs  (those  that  require  objects)  and  intransitive  verbs  (those 
that  may  not  occur  with  an  object).  (See  Figure  2.) 

[Figure 2: tree diagrams for "Jill admires" and "Jack disappeared Jill".]

Figure 2. Ungrammatical sentences allowed by our grammar.


The  verb  admires  is  transitive;  hence,  the  sentence  Jill  admires  is  ungrammatical, 
while  sentences  like  Jill  admires  Jack  are  good.  Also,  the  verb  disappear  is  intransitive; 
therefore,  sentences  like  Jack  disappeared  Jill  are  ungrammatical,  while  sentences  like  Jack 
disappeared  are  acceptable.  One  solution  to  this  problem  is  to  assume  our  grammar  has 
access  to  a  lexicon  in  addition  to  a  set  of  rules  for  building  trees.  Sample  entries  in  our 
lexicon  would  be: 


admire              disappear

+V                  +V
+[ ___ NP]          +[ ___ ]

Figure 3. Sample lexicon entries for a PS grammar.


The entries indicate a syntactic context for insertion into phrase markers. The
syntactic category of both entries is verb. Contextual features are also indicated: the verb
admire has the feature +[ ___ NP], which means an NP must follow it, whereas the verb
disappear has the feature +[ ___ ], which means no NP may follow it. Other possibilities
for syntactic features include: [1st person], [3rd person], [+plural], and [-plural]. A "+"
indicates that a word has that feature, and a "-" indicates that a word is without the feature.
For example, [+N, +common] indicates a common noun, [+N, -common] indicates a
proper noun, and [3rd person, -plural] indicates third person singular. Despite this
convenient notation, phrase structure grammars still have several problems, described in
the following section.
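
The way contextual features block sentences such as "Jill admires" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the dictionary layout and the names LEXICON and fits_frame are hypothetical, and only the two verbs discussed above are included.

```python
# Hypothetical encoding of the two sample lexicon entries: each verb lists
# the categories that must follow it inside the VP (its contextual feature).
LEXICON = {
    "admire": {"category": "V", "frame": ["NP"]},   # +[ ___ NP]
    "disappear": {"category": "V", "frame": []},    # +[ ___ ]
}


def fits_frame(verb, following):
    """True if the categories following the verb match its contextual feature."""
    return LEXICON[verb]["frame"] == list(following)


print(fits_frame("admire", ["NP"]))     # Jill admires Jack
print(fits_frame("admire", []))         # Jill admires
print(fits_frame("disappear", ["NP"]))  # Jack disappeared Jill
```

A verb would only be inserted under a V node when fits_frame holds for the phrase marker at hand, which is how the lexicon filters out the trees of Figure 2.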


Inadequacy  of  Phrase  Structure  Grammars 

There  are  three  basic  problems  with  phrase  structure  grammars  in  representing  all 
the  significant  aspects  of  language  structure,  as  discussed  in  [Barr  and  Feigenbaum;  1981]: 
1. PS grammars make the description of English unnecessarily clumsy and complex - for
example,  in  the  treatment  of  conjunction,  auxiliary  verbs,  and  passive  sentences.  The 
important  point  here  is  that  PS  grammars  fail  to  capture  linguistically  significant 
generalizations  about  the  English  language.  For  instance,  the  fairly  complicated  rule  (4)  is 
required  to  represent  the  simple  generalization  in  (3).  This  is  due,  in  part,  to  the  special 
way  in  which  the  verbs  'have'  and  'be'  must  be  handled. 


(3)  In  questions,  the  auxiliary  verbs  appear  in  the  same  relative  order  as  in 
declarative  sentences,  but  the  first  auxiliary  verb  occurs  to  the  left  of  the  subject. 
[Akmajian  and  Heny;  1975]. 


(4)  S  ->  { NP Aux
              Modal NP (HAVE) (BE)
              HAVE NP (BE)
              BE NP }  VP


2.  PS  grammars  assign  identical  phrase  markers  to  sentences  that  have  unique  meanings,  as 
in  Figure  4.  The  difference  in  meaning  can  be  attributed  to  a  difference  in  underlying 
syntactic  structure,  as  we  will  see  later  in  this  paper. 


[Figure 4: a single phrase marker for "Jack is easy to satisfy" and "Jack is eager to satisfy".]

Figure 4. The same phrase marker is assigned to two sentences
with different meanings.


3.  PS  grammars  provide  no  basis  for  identifying  as  similar  the  sentences  that  have 
different  surface  structures  but  much  of  their  "meaning"  in  common. 




A  porcupine  nibbled  that  elm. 

That  elm  was  nibbled  by  a  porcupine. 


My  car  isn't  working. 

The  automobile  that  belongs  to  me  is  out  of  order. 


Which  elm  did  the  porcupine  nibble? 

...the elm which the porcupine nibbled...

Figure  5.  Different  forms  with  underlying  similarity,  from 
[Winograd;  1983]. 


Making  the  connection  between  the  above  sentences  relies  on  understanding  the 
underlying  similarity  of  structure.  Although  the  first  pair  of  sentences  are  paraphrases,  as 
are  the  second  pair,  it  is  understood  by  the  reader  that  the  first  pair  is  more  closely  related 
than  the  second.  Furthermore,  another  close  connection  is  intuitively  made  between  the 
sentences  in  the  last  pair,  although  these  two  sentences  have  entirely  different  functions  — 
one  is  a  question,  and  the  other  is  a  phrase  referring  to  an  object. 


Motivation  for  Transformational  Grammars 


The examples discussed thus far suggest that some properties of sentences in
natural language cannot be accounted for by single phrase markers alone, that is, in terms
of relations between immediate constituents. Some words in a sentence are connected in
some sense but are, nevertheless, not contiguous in the linear ordering of the words.


One  way  to  account  for  discontinuous  dependencies  of  this  kind  is  to  come  up  with 
a  way  by  which  two  or  more  phrase  markers  can  themselves  be  related  to  each  other  in 
some  specified  way.  Akmajian  points  out  that  this  is  the  fundamental  insight  of  the  Theory 
of  Transformational  Grammar.  We  are  now  ready  to  explore  the  Theory  of 
Transformational  Generative  Grammars. 

[Figure 6 (diagram): within the syntactic component, the transformational component
relates deep structure to surface structure.]

Figure 6. The transformational model, from [Winograd; 1983].


In the standard transformational model, depicted in Figure 6, the base component
is a CFG (context-free grammar) that generates deep structures. The deep structure
contains everything relevant to its meaning. For example, the deep structures for the surface
structure shown earlier in Figure 4 would look something like those in Figure 7. The
triangles indicate a portion of the tree which has been left unspecified, since it is not
relevant to the example at hand.


[Figure 7 (tree diagrams): one deep structure with the terminals "Jack be eager" and an
embedded "Jack satisfy -X-"; the other with an embedded "-X- satisfy Jack" and "be easy".]

Figure 7. Deep structures which are transformed into similar surface structures.


The  second  component  is  the  transformational  component,  consisting  of  a  set  of 
transformational  rules  that  operate  on  phrase  markers.  The  transformational  component  is 
used  in  a  derivation  process  by  which  a  deep  structure  is  converted  to  a  surface  structure, 
which  can  then  be  used  to  produce  an  actual  sequence  of  sounds  or  words  in  a  sentence. 
The  deep  structures  of  Figure  7  would  be  converted  by  a  sequence  of  transformations  into 
the same surface structure (shown in Figure 4). The difference in meaning between the
two sentences can be attributed to the difference in their underlying structures. Conversely,
a  single  deep  structure  can  give  rise  to  more  than  one  surface  structure,  depending  on 
which  transformations  the  deep  structure  undergoes. 


The transformational model is not a model of how language is produced but, rather,
a model which formalizes the knowledge a person must have about the syntax of a
particular language.

Transformational  Grammar  Explained 

Before we define a TG (transformational grammar) formally, several features of TG
will be explained. There is no standard, accepted notation for transformational rules, but a
version described by Akmajian and Heny (1975) will serve our purpose.

A  transformational  rule  consists  basically  of  an  SD  (structural  description),  and  an 
SC  (structural  change).  An  SD  is  a  pattern  which  must  be  matched  against  a  tree  in  the 
course  of  a  derivation  in  order  for  the  corresponding  SC  to  take  place.  In  addition  to  an 
SD,  a  rule  may  specify  a  set  of  conditions  which  must  be  met  in  order  for  the  rule  to  fire. 
There  are  three  elementary  transformations  which  can  appear  in  a  structural  change 
description:  deletion,  substitution,  and  adjunction.  Adjunction  can  alternatively  be  sister 
adjunction, daughter adjunction, or Chomsky adjunction. To illustrate these concepts, we
begin  by  explaining  the  transformational  rule  of  Dative  Movement  given  below  as  rule  (5). 

(5) Dative Movement (optional)

SD:  V  -  NP  -  {to, for}  -  NP
     1     2         3          4

SC:  1+4   2         0          0

a.  Mary  gave  a  book  to  the  man. 

b.  Mary  gave  the  man  a  book. 

[Figure 8a (tree diagram): terminals "Mary Past give a book to the man", with "give",
"a book", "to", and "the man" marked as terms 1-4 of the SD.]

Figure 8a. Mary gave a book to the man.

[Figure 8b (tree diagram): terminals "Mary Past give the man a book", with "give the man"
marked 1+4 and "a book" marked 2.]

Figure 8b. Mary gave the man a book.


Dative  movement  applied  to  the  phrase  marker  in  Figure  8a  gives  Figure  8b.  Since  the  SD 
matches  the  subtree  of  Figure  8a  indicated  by  the  numbered  constituents,  we  can  apply  the 
SC, which derives the phrase marker shown in Figure 8b. Again, the constituents are
numbered  for  ease  of  reference.  The  "+"  in  the  SC  of  Rule  (5)  indicates  sister  adjunction 
is  to  be  performed  on  the  first  and  fourth  constituents.  The  fourth  term  (the  man)  is  to  be 
placed  immediately  to  the  right  of  the  first  term  (give).  The  first  and  fourth  terms  are  now 
sisters  to  each  other  and  are  daughters  of  the  same  VP  node,  as  shown  in  Figure  8b. 

Referring  again  to  the  SC  in  Rule  (5),  we  note  that  the  second  constituent  remains 
as  is.  Continuing,  the  symbol  "0"  in  the  SC  indicates  deletion  is  to  be  performed  for  the 
corresponding  term  in  the  SD,  so  the  third  and  fourth  constituents  are  deleted. 
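
The matching of an SD and the application of its SC can be sketched in Python for Dative Movement. This is a simplified illustration, not the authors' formalism: the sentence is assumed to be a flat list of (category, words) constituents rather than a tree, sister adjunction is modeled as insertion immediately to the right of the verb, and the function name dative_movement is hypothetical.

```python
def dative_movement(constituents):
    """Apply rule (5) once if its SD matches; otherwise return the input unchanged.

    SD: V - NP - {to, for} - NP   (terms 1 2 3 4)
    SC: 1+4  2  0  0              (term 4 moves next to term 1; term 3 is deleted)
    """
    for i in range(len(constituents) - 3):
        t1, t2, t3, t4 = constituents[i:i + 4]
        if (t1[0] == "V" and t2[0] == "NP"
                and t3[0] == "P" and t3[1] in ("to", "for")
                and t4[0] == "NP"):
            return constituents[:i] + [t1, t4, t2] + constituents[i + 4:]
    return constituents


sentence = [("NP", "Mary"), ("Tense", "Past"), ("V", "give"),
            ("NP", "a book"), ("P", "to"), ("NP", "the man")]
print(" ".join(words for _, words in dative_movement(sentence)))
```

Because the rule is optional, a grammar would be free to leave the input untouched even when the SD matches; the sketch always applies it to keep the example short.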
To illustrate Chomsky adjunction and daughter adjunction, we will explain the
Passive transformation.

(6) Passive (optional)

SD:  NP  -  Aux  -  V  -  NP
     1      2       3     4

SC:  4      2 > be+en     3     by # 1

[Figure 9a (tree diagram): terminals "the Colts | pres have en | beat | the Jets",
marked as terms 1-4 of the SD.]

Figure 9a. The Colts have beaten the Jets.

[Figure 9b (tree diagram): terminals "the Jets | pres have en be en | beat | by the Colts",
annotated "2 > be + en" and "3 + by # 1".]

Figure 9b. The Jets have been beaten by the Colts.

First, term 4 is substituted for term 1. Next, we see by examining the SC that the
second constituent of our new, derived structure is to be "2 > be + en". The symbol ">"
indicates that "be" and "en" are to be daughter-adjoined as the rightmost daughters of
"Aux" (term 2). Similarly, "<" indicates leftmost daughter-adjunction. Recall that sister
adjunction of node-y to node-x implies making node-y a daughter of whatever node
dominates node-x, whereas daughter-adjunction of node-y to node-x merely means
adding a daughter to node-x, as shown in Figure 10.


Our last elementary transformation is Chomsky-adjunction. Referring to Figure 11,
Chomsky-adjunction consists of adjoining a new node with a node that is already there, in
this case NP, as children of a new parent node whose label will be identical to the label of
the node that was previously there. Substituting this derived structure for our original NP
node completes the transformation.


[Figure 11 (tree diagrams): before, "by" # [NP the Colts]; after, a new NP node
dominating "by" and the original NP: [NP by [NP the Colts]].]

Figure 11. Chomsky-adjunction.
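
The three kinds of adjunction differ only in where the new node is attached, which a small Python sketch can make explicit. The Node class and the function names below are hypothetical, introduced purely for illustration; only the rightward variants of sister and daughter adjunction and the leftward variant of Chomsky-adjunction are shown.

```python
class Node:
    """A minimal tree node with a parent pointer (an assumption of this sketch)."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])
        self.parent = None
        for child in self.children:
            child.parent = self

    def bracket(self):
        """Render the subtree as a labeled bracketing."""
        if not self.children:
            return self.label
        inner = " ".join(child.bracket() for child in self.children)
        return f"[{self.label} {inner}]"


def sister_adjoin_right(x, y):
    """Make y a daughter of x's parent, immediately to the right of x."""
    parent = x.parent
    parent.children.insert(parent.children.index(x) + 1, y)
    y.parent = parent


def daughter_adjoin_right(x, y):
    """Make y the rightmost daughter of x."""
    x.children.append(y)
    y.parent = x


def chomsky_adjoin_left(x, y):
    """Put a new node labeled like x where x was; its daughters are y and x."""
    new = Node(x.label)
    if x.parent is not None:
        siblings = x.parent.children
        siblings[siblings.index(x)] = new
        new.parent = x.parent
    new.children = [y, x]
    x.parent = y.parent = new
    return new


agent = Node("NP", [Node("the Colts")])
vp = Node("VP", [Node("V", [Node("beat")]), agent])
chomsky_adjoin_left(agent, Node("by"))
print(vp.bracket())
```

The printed bracketing shows the doubled NP label that is characteristic of Chomsky-adjunction, matching the shape described for Figure 11.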

Ordering  of  Transformations 

It  turns  out  that  individual  transformational  rules  can  interact  with  each  other  to 
derive  complex  surface  structures  in  a  straightforward  way.  Transformations  are  applied  to 
deep  structures  in  a  specific  linear  order,  based  on  dependencies  between  the  rules. 
Furthermore,  if  a  deep  structure  contains  embedded  sentences,  the  entire  collection  of 
transformations  is  applied  in  a  cyclic  fashion,  first  to  the  most  deeply  embedded  sentence  in 
a tree, then to the next higher sentence and so on. Referring to Figure 12, all the rules that
could be applied would first apply to S3 in their proper order, then to S2, and then to S1.

Some transformations are optional and others are obligatory. Originally,
housekeeping  rules  were  obligatory  and  all  others  were  optional;  however,  later  versions  of 
the  grammar  theory  adopted  a  convention  whereby  all  transformations  are  meaning 
preserving.  In  other  words,  a  deep  structure  should  capture  all  the  meaning  of  a  sentence, 
and  a  transformation  should  not  change  its  meaning.  For  example,  a  transformation  should 
not  turn  a  statement  into  a  question.  In  the  case  of  yes/no  questions,  this  meant  that  the 
very  first  PS  rule  should  be  modified  as  shown  in  Rule  (7). 

(7)  S  ->  (Q)  NP  Aux  VP 

Then  the  Question  transformation  would  be  made  obligatory.  If  the  phrase  marker  Q  is 
present,  the  Question  transformation  will  apply. 

(8) Question (Obligatory)

SD:  Q  -  NP  -  Tense ({Modal, HAVE, BE})
     1     2      3

SC:  1     3      2

We are now ready to illustrate how two rules can be applied in sequence. First, we
introduce another obligatory transformation, Affix Hopping:

(9) Affix Hopping (Obligatory)

SD:  Affix  -  V
     1         2

SC:  2 # 1

[Figure 13a (tree diagram): nodes Q, NP, Aux over the terminals "Mildred's cat pres have
en break the Ming vase".]

Figure 13a. Deep structure for "Mildred's cat has broken the Ming vase".

[Figure 13b (tree diagram): terminals "pres have Mildred's cat en break the Ming vase".]

Figure 13b. After Question Transformation.

[Figure 13c (tree diagram): terminals "have-pres Mildred's cat break-en the Ming vase".]

Figure 13c. After Affix Hopping we get "Has Mildred's cat broken the Ming vase?"
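
The two-step derivation of Figures 13a-13c can be replayed with a short Python sketch. It rests on several simplifying assumptions that are not in the original: the phrase marker is flattened to a list of (category, word) pairs, the Q marker is kept in the output, and the function names question and affix_hopping are hypothetical.

```python
def question(tokens):
    """Rule (8): Q - NP - Tense (Aux)  =>  1 3 2 (obligatory when Q is present)."""
    cats = [cat for cat, _ in tokens]
    if cats[:3] == ["Q", "NP", "Tense"]:
        end = 4 if len(tokens) > 3 and tokens[3][0] == "Aux" else 3
        return [tokens[0]] + tokens[2:end] + [tokens[1]] + tokens[end:]
    return tokens


def affix_hopping(tokens):
    """Rule (9): Affix - V  =>  2 # 1, applied to every affix-verb pair."""
    out, i = [], 0
    while i < len(tokens):
        is_affix = tokens[i][0] in ("Tense", "Affix")
        if is_affix and i + 1 < len(tokens) and tokens[i + 1][0] in ("V", "Aux"):
            cat, word = tokens[i + 1]
            out.append((cat, word + "-" + tokens[i][1]))  # verb, then hopped affix
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


deep = [("Q", "Q"), ("NP", "Mildred's cat"), ("Tense", "pres"), ("Aux", "have"),
        ("Affix", "en"), ("V", "break"), ("NP", "the Ming vase")]
after_question = question(deep)
surface = affix_hopping(after_question)
print(" ".join(word for _, word in surface))
```

The order matters here: Affix Hopping has to run after Question, because it is only once "pres have" has moved in front of the subject that "en" ends up adjacent to "break" alone.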


concise  manner: 


(10a)  Mary  gave  a  book  to  the  man. 

(10b)  Mary  gave  the  man  a  book. 

(10c)  A  book  was  given  to  the  man  by  Mary. 

(10d) The man was given a book by Mary.

A complete ordering of all the transformational rules is given in Appendix A. The proof for
ordering Dative Movement before Passive is as follows (from [Akmajian and Heny; 1975]):

Proof by contradiction:

Assume the contrary; that is, assume the rules must be applied in the order: (1)
Passive, (2) Dative Movement. As both rules are optional, we have four possibilities for
generating the sentences given in (10). The four possibilities are:


(11)          Passive     Dative

     (i)      Applies     Applies
     (ii)     Applies     Doesn't
     (iii)    Doesn't     Applies
     (iv)     Doesn't     Doesn't


Clearly,  (iv)  yields  (10a);  our  basic  sentence  results  when  neither  rule  is  applied. 
Option  (iii),  where  only  Dative  Movement  applies,  yields  sentence  (10b). 

Option  (ii),  where  only  Passive  applies,  yields  sentence  (10c). 

Sentence (10d) is left to derive, with only the option left of applying both rules.

I. Apply Passive first, then Dative:

Mary past give a book to the man
NP   Aux  V    NP

Passive: A book past be en give to the man by Mary
Dative: Does not apply, since there is no NP between "give" and "to".
= (10c) A book was given to the man by Mary

Under the assumed order, then, (10d) cannot be derived, which contradicts the
assumption.

II. Apply Dative first, then Passive:

Mary past give a book to the man
          V    NP     Prep NP

Dative: Mary past give the man a book
        NP   Aux  V    NP

Passive: The man past be en give by Mary a book
Affix Hopping: The man be-past give-en by Mary a book
Extraposition: The man be-past give-en a book by Mary.

= (10d) The man was given a book by Mary.

Thus, all the sentences of (10) can be derived by applying the rules (optionally) in the order:
(1) Dative Movement, (2) Passive. Note: For Part II, additional transformations were
required to obtain the final surface structure.
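
The ordering argument can be checked mechanically by trying both orders on the same underlying string. The Python sketch below is a deliberately crude model and makes assumptions beyond the text: sentences are flat lists of (category, words) pairs, the by-phrase is placed directly at the end of the sentence (folding in the reordering that the derivation above attributes to Extraposition), Affix Hopping is omitted, and the names dative, passive, and words are hypothetical.

```python
def dative(s):
    """Dative Movement: V - NP - {to, for} - NP  =>  V NP(4) NP(2)."""
    for i in range(len(s) - 3):
        a, b, c, d = s[i:i + 4]
        if a[0] == "V" and b[0] == "NP" and c[0] == "P" and c[1] in ("to", "for") and d[0] == "NP":
            return s[:i] + [a, d, b] + s[i + 4:]
    return s  # SD not met: the rule does not apply


def passive(s):
    """Passive: NP - Aux - V - NP  =>  NP(4) Aux be en V ... by NP(1)."""
    if [cat for cat, _ in s[:4]] == ["NP", "Aux", "V", "NP"]:
        subj, aux, verb, obj = s[:4]
        return [obj, aux, ("Aux", "be"), ("Affix", "en"), verb] + s[4:] + [("PP", "by " + subj[1])]
    return s


def words(s):
    return " ".join(w for _, w in s)


BASE = [("NP", "Mary"), ("Aux", "past"), ("V", "give"),
        ("NP", "a book"), ("P", "to"), ("NP", "the man")]

print("Passive, then Dative:", words(dative(passive(BASE))))
print("Dative, then Passive:", words(passive(dative(BASE))))
```

Only the second order produces a string underlying (10d); in the first order Dative Movement finds no NP between "give" and "to" and leaves the passive sentence unchanged.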


Formalization  of  Transformations 

In a desire to allow the application of mathematical techniques to transformations,
Peters and Ritchie, in their paper "On the Generative Power of Transformational
Grammars", provided general definitions modeling grammatical transformations as
mappings on trees. These trees are described in their paper as labeled bracketings.

The  notation  used  may  appear  a  bit  formidable  to  some  readers;  however, 
formalization  of  the  notions  presented  earlier  in  this  paper  is  the  basis  for  all  further  work  in 
the  field  and  cannot  be  omitted.  Care  is  taken  to  provide  a  basic  overview  without 
obscuring  the  concept.  To  that  end,  the  formalism  described  in  Peters  and  Ritchie  is 
presented  again  here  with  considerably  less  detail  and  with  explanations  added  where 
appropriate.  Definitions  are  repeated  from  Peters  and  Ritchie  (1973)  with  little  or  no 
modification  where  it  is  essential  to  be  precise. 

Let VT and VN be fixed, disjoint vocabularies of terminals and nonterminals,
respectively. Let L = { [A | A ∈ VN } and R = { ]A | A ∈ VN }. Labeled bracketings
are finite strings of symbols from VT ∪ VN ∪ L ∪ R; terminal labeled bracketings are
finite strings from VT ∪ L ∪ R. A well-formed labeled bracketing is one in which the
brackets occur in (nested) matched pairs.

Definition 1. A string φ is a well-formed labeled bracketing if

i) φ ∈ VT ∪ VN,

ii) φ = ψω, or

iii) φ = [A ψ ]A, A ∈ VN,

where ψ, ω are well-formed labeled bracketings.

Peters and Ritchie also define a debracketing function mapping labeled bracketings into
strings of terminals and nonterminals as follows:

Definition 2. The debracketing function d on labeled bracketings
is the mapping defined by setting

d(a) = a if a ∈ VT ∪ VN
     = e (the empty string) if a ∈ L ∪ R

This is a simple homomorphism, since d(φψ) = d(φ)d(ψ) for
labeled bracketings φ, ψ.

A  few  examples  are  in  order: 

Examples (1)-(4). Take VN = {S, NP, N, VP, V, ADJ}

VT = {Bruno, beer, had, a, Pilots, crashing, planes, are}

then (1)-(3) are well-formed labeled bracketings, and (4) is not:

(1) Bruno had a beer

(2) [N [NP [N Bruno]N ]NP ]N

(3) [S [NP [N Pilots]N ]NP [VP are [NP [ADJ crashing]ADJ
    [N planes]N ]NP ]VP ]S

(4) [V are]N

The  debracketization  of  (2)  is  the  terminal  "Bruno".  Although  (2)  is  well-formed,  it  says 
twice  that  "Bruno"  is  a  noun.  Definition  3  eliminates  this  sort  of  redundancy. 
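
Definitions 1 and 2 translate almost directly into code, and examples (1)-(4) can be used to check the translation. The Python sketch below is an illustration under stated assumptions: a labeled bracketing is given as a string and split into tokens by a regular expression, and the names TOKEN, tokenize, well_formed, and debracket are hypothetical.

```python
import re

# A token is a left bracket "[A", a right bracket "]A", or a plain symbol.
TOKEN = re.compile(r"\[[\w-]+|\][\w-]+|[^\s\[\]]+")


def tokenize(text):
    return TOKEN.findall(text)


def well_formed(tokens):
    """Definition 1: symbols, concatenations, and matched [A ... ]A pairs
    around non-empty well-formed material."""
    stack = []        # labels of the brackets currently open
    filled = [False]  # filled[-1]: does the current level contain anything yet?
    for tok in tokens:
        if tok.startswith("["):
            stack.append(tok[1:])
            filled.append(False)
        elif tok.startswith("]"):
            if not stack or stack.pop() != tok[1:] or not filled.pop():
                return False
            filled[-1] = True
        else:
            filled[-1] = True
    return not stack and filled[0]


def debracket(tokens):
    """Definition 2: keep the symbols of VT and VN, erase every bracket."""
    return [tok for tok in tokens if tok[0] not in "[]"]


for example in ("Bruno had a beer",
                "[N [NP [N Bruno]N ]NP ]N",
                "[V are]N"):
    print(example, "->", well_formed(tokenize(example)))
print(debracket(tokenize("[N [NP [N Bruno]N ]NP ]N")))
```

The stack mirrors clause (iii) of Definition 1: a right bracket is accepted only if it closes the most recently opened bracket with the same label.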

Definition 3. A labeled bracketing φ is said to be reduced if there are no A, χ1, χ2,
ψ, ω, σ, τ such that either φ = χ1 [A ]A χ2, or

i) φ = χ1 [A ψ ]A χ2,

ii) ψ = σ [A ω ]A τ, and

iii) ψ and ω are well-formed, σ ∈ L* and τ ∈ R*.

For labeled bracketing (2) we have χ1 = χ2 = e, ω = Bruno, σ = [NP, τ = ]NP,
and A = N, so a reduced labeled bracketing for (2) would be: [NP [N Bruno]N ]NP.

We can define the interior of a terminal labeled bracketing as the longest well-formed
substring of a labeled bracketing retaining all its terminals. The interior of (2) is (2)
itself, while the interior of (4) is "are". The left and right exteriors of (4) are El(4) = [V,
Er(4) = ]N.

ψ2 = [VP are,

ψ3 = [NP [Adj crashing]Adj, and
ψ4 = [NP [N planes]N ]NP ]VP ]S.

The standard factorization of (12) is (ψ1, ψ2ψ3, ψ4).

The contents C(φ) of a terminal labeled bracketing (defined iff φ is a substring of a
well-formed labeled bracketing) is the concatenation of the interiors of the terms of a unique
factorization where each factor has an interior. Without repeating the formal definition
here, a few examples will serve to illustrate the general notion. The contents of ψ2ψ3 is
"are [Adj crashing]Adj", and of ψ2ψ3ψ4 is ψ2ψ3 [NP [N planes]N ]NP ]VP. Basically,
we identify the longest well-formed substring of each individual factor and then
concatenate the results; we extract everything that is well-formed. We also use R(φ) to refer
to the string of brackets remaining after C(φ) has been removed; for example, R(ψ2ψ3) is
[VP [NP, and R(ψ2ψ3ψ4) is ]S.

We  are  now  in  a  position  to  define  the  structural  condition  and  transformational 
mapping  for  labeled  bracketing  notation.  The  corresponding  notions  for  phrase  markers 
are  structural  description  and  structural  change.  We  begin  by  describing  the  three  kinds  of 
elementary  transformations  which  are  allowed  in  a  transformational  mapping:  deletion  of  a 
certain  factor,  substitution  of  a  certain  factor  by  a  sequence  of  other  factors,  and  adjoining  a 
sequence  of  factors  to  a  given  factor. 

Definition 5. The deletion elementary is the function Td from substrings of well-formed
labeled bracketings to labeled bracketings defined by Td(φ) = R(φ).

The substitution elementary is the function Ts from pairs (φ,ψ) of substrings of
well-formed labeled bracketings to labeled bracketings defined if and only if φ has
an interior by setting Ts(φ,ψ) = El(φ) C(ψ) Er(φ).

The left-adjunction elementary is the function Tl from pairs (φ,ψ) of substrings of
well-formed labeled bracketings to labeled bracketings defined iff φ has an interior
by setting

Tl(φ,ψ) = El(φ) [A1 ... [Am C(ψ) I(φ) ]Am ... ]A1 Er(φ),

where A1,...,Am is the longest sequence of nonterminals such that there is a
well-formed labeled bracketing ω for which I(φ) = [A1 ... [Am ω ]Am ... ]A1,
allowing the case m=0 in which there are no such brackets.

The right-adjunction elementary is defined parallel to left-adjunction.

A structural condition on a factorization is a Boolean combination of three kinds of
predicates, where the factorization has n terms:

h->i =n j->k,

i->j =n x,

where i->j indicates a sequence of ith-jth factors,
and x is a terminal string.

As an example, take the predicate NP^9_{2->2}. This means that the second term of a
factorization (which has 9 terms) must be a noun phrase in order for the structural condition
to apply.

Without  repeating  the  formal  definition  of  structural  condition  given  by  Peters  and 
Ritchie  (1973),  it  will  become  clear  how  a  transformation  works  using  this  new  notation  by 
carefully  going  through  an  example,  slightly  modified  from  their  presentation.  Consider 
the sentence: "By whom had the call been put through to Chicago before John left?" The
deep structure is:

Q wh+Δ past have+en put through the call to Chicago by be+en before John left,
where "Δ" is a dummy noun form indicating someone.

We  can  apply  the  Passive  transformation  which  matches  against  the  deep  structure 
as  follows: 
(2) wh+Δ

(3) past have+en

(4) put

(5) through

(6) the call

(7) to Chicago by

(4) [VP [V put]V

(5) [Prt through]Prt

(6) [NP [Det the]Det [N call]N ]NP

(7) [Dir to Chicago]Dir [manner [agent [prep-p [prep by]prep

(8) [passive be en]passive ]prep-p ]agent ]manner ]VP

(9) [time before John left]time ]PP ]S.

The transformation Passive consists of a structural condition

NP^9_{2->2}, Aux^9_{3->3}, V^9_{4->4}, NP^9_{6->6}, Passive^9_{8->8},

and a transformational mapping:

{ [Ts, (2,2),(6,6)], [Tr, (3,3),(8,8)], [Td, (6,6)], [Ts, (8,8),(2,2)] }

where Ts is the substitution elementary,
Tr is the right-adjunction elementary, and
Td is the deletion elementary (refer to Definition 5).

Applying  this  transformation  to  the  factorization  of  the  deep  structure  yields: 

(1)  same  as  factor  1  above 

(2) [NP [det the]det [N call]N ]NP, from applying [Ts, (2,2),(6,6)]
    (i.e. replace the interior of term 2 with the contents of term 6). See Definition 4.

(3) [PP [Aux [Aux ... ]Aux [passive be en]passive ]Aux,
    where A1...Am = Aux,
    [PP is the left exterior of factor 3,
    [passive be en]passive is the contents of factor 8,
    and [Aux ... ]Aux represents the interior of factor 3. See Definition 5.

(4) same

(5) same

(6) the contents of factor 6 is factor 6 itself, so R(φ) = e

(7) same

(8) [NP wh [N Δ]N ]NP ]prep-p ]agent ]manner ]VP

(9) same

The  seventh  factor  gives  us  an  opportunity  to  make  a  distinction  between  the  interior  and 
contents  of  a  labeled  bracketing.  The  interior  of  factor  7  does  not  exist  since  there  is  no 
well-formed  substring  of  factor  7  containing  all  its  terminals;  whereas,  the  contents  of 
factor  7  does  exist.  The  contents  of  factor  7  is 
[Dir  to  Chicago]Dir  [prep  by]prep, 

extracting  all  well-formed  substrings  and  concatenating  them.  Note  also, 

R(7)  =  [manner  [agent  [prep-p. 

Peters and Ritchie go on to prove a very important theorem which is restated as Theorem 1.
This theorem identifies the equivalence between transformational grammars and r.e.
(recursively enumerable) languages.

Theorem 1. Every recursively enumerable language is generated by some
context-sensitive based transformational grammar, and conversely.

Areas  for  Further  Research 

A  number  of  modifications  and  extensions  have  been  proposed  to  the  Standard 
Theory  of  Transformational  Grammar.  Some  of  these  will  be  explored  below. 

Extended  Standard  Theory 

The Standard Theory relies on the Katz-Postal Hypothesis (1964), which states that
transformations are meaning-preserving.

(14) Katz-Postal Hypothesis

Transformations  are  meaning-preserving,  in  the  following  sense:  if  two 
surface  structures  derive  from  exactly  the  same  underlying  structure  and  if 
their  derivations  differ  only  in  that  an  optional  transformation  has  applied  in 
one  but  not  the  other,  then  they  must  have  the  same  meaning. 

Critics of the Standard Theory, however, were quick to point out that meaning was affected
by the application of some transformations. Two apparent counterexamples to the
hypothesis will suffice (from [Akmajian and Henry; 1975]):


(15a)  John  didn't  leave  the  room,  did  he? 

(15b)  John  left  the  room,  didn't  he? 

(15c)  Q  -  not  -  John  -  past  -  leave  the  room 

(16a)  Few  people  have  read  three  of  Hemingway's  novels. 

(16b)  Three  of  Hemingway's  novels  have  been  read  by  few  people. 

The first pair of sentences both derive from the same underlying structure, namely (15c).
They have undergone the same series of transformational rules (i.e., Tag Formation,
Negative Placement, Contraction, Question, etc.), yet they do not have the same meaning.
The only difference is that Negative Placement has placed not in the main clause of (15a)
and in the tag of (15b). Sentence (15a) supposes that John has not left the room and expects
the answer "no", whereas sentence (15b) supposes that John has left the room and expects
the answer "yes". The second counterexample concerns derivations in which the Passive
transformation has applied. Sentence (16a) says that few people have read as many as three
of Hemingway's novels, whereas (16b) says that there are three particular novels that few
people have read, so the passive (16b) differs in meaning from its active counterpart (16a).

Examples such as these have led to the Extended Standard Theory. In this extended
model, the semantic interpretation rules operate on the entire set of trees used in the
derivation instead of extracting meaning only from the deep structure.

Generative  Semantics 

Proponents of this model made sweeping changes to the Standard Theory to handle
the problems described above and other issues as well. The basic idea is that there is no
separate semantic interpretation component; rather, both the semantic and syntactic
representations are embedded in phrase markers. The base structures are no longer
merely syntactic but are logical representations. The terminal nodes of the base structure
are no longer words but semantically interpretable terms, similar to the terms of symbolic
logic. Sentence pairs (15) and (16) can now be handled because the members of each pair
derive not from a single underlying structure but from two different logical forms.

Montague Grammar

The  difference  between  Montague  Grammar  and  the  Standard  Theory  is  in  the 
treatment  of  the  semantic  structure.  Rather  than  using  the  deep  structure  as  input  to  a 
semantic  component  to  produce  meaning,  Montague  Grammar  associates  a  semantic  rule 
with  each  syntactic  rule.  Whenever  a  syntactic  rule  is  applied  to  the  syntactic  structure,  the 
semantic rule is applied to the corresponding logical structure. The formalism is extremely
complex, involving intensional logic, and will not be described here.
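
Although the full formalism is beyond the scope of this paper, the pairing of syntactic and semantic rules can be illustrated with a toy sketch. Everything in it (the lexicon, the two rules, and the use of ordinary Python functions and tuples in place of intensional logic) is an assumption made for illustration and is not Montague's actual system.

```python
# Lexicon: each word carries a syntactic category and a meaning.
LEXICON = {
    "John":  ("NP", "john"),
    "Mary":  ("NP", "mary"),
    "walks": ("VP", lambda subj: ("walk", subj)),
    "loves": ("TV", lambda obj: lambda subj: ("love", subj, obj)),
}

# Each syntactic rule (parent, daughters) is paired with a semantic rule
# that combines the meanings of the daughters.
RULES = [
    (("S", ("NP", "VP")), lambda np, vp: vp(np)),
    (("VP", ("TV", "NP")), lambda tv, np: tv(np)),
]

def combine(left, right):
    """Apply a syntactic rule and, in step with it, its semantic rule."""
    (cat_l, sem_l), (cat_r, sem_r) = left, right
    for (parent, daughters), semantic_rule in RULES:
        if daughters == (cat_l, cat_r):
            return parent, semantic_rule(sem_l, sem_r)
    raise ValueError(f"no rule combines {cat_l} with {cat_r}")

vp = combine(LEXICON["loves"], LEXICON["Mary"])   # VP -> TV NP
s = combine(LEXICON["John"], vp)                  # S  -> NP VP
print(s)  # ('S', ('love', 'john', 'mary'))
```

Each call to `combine` builds one syntactic constituent and its meaning together, so no separate pass over a deep structure is needed.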

Trace  Theory 

A  modification  of  Extended  Standard  Theory,  Trace  Theory  proposed  that  both  the 
phonological  and  semantic  components  operate  only  on  the  surface  structures,  but  the 
surface  structure  would  now  contain  traces  to  capture  the  relevant  information  about 
meaning  from  the  deep  structure.  As  an  example  (from  [Winograd;  1983])  consider  the 
sentences  (17). 

(17a)  This  is  the  oscilloscope  Tom  used  to  fix. 

(17b)  This  is  the  oscilloscope  Tom  used  to  fix  the  radio. 

(17c)  Tom  used  the  oscilloscope  to  fix  the  radio. 

Here we see how Trace Theory can explain a phonological process. In (17a), a speaker can
contract the phrase "used to" to "useta", whereas the same contraction does not apply in
sentence (17b). Why? Contraction is blocked in precisely those cases where we might say
"Something was deleted between the two words we are trying to contract." Referring to the
underlying structure in (17c), we see that the phrase "the oscilloscope" occurs between
"used" and "to". In Trace Theory, whenever we move a phrase, a trace is left behind. The
surface structure retains this trace, and the contraction rule can easily be restated to block
contraction across a trace marker. The same idea extends to the semantic interpretation of
sentences that have similar surface structures but different meanings.
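
The restated contraction rule can be sketched as follows. The trace marker "t", its placement in the two surface strings, and the function name are assumptions made for this illustration; the marker is not pronounced, so it is dropped at the end.

```python
TRACE = "t"  # hypothetical marker left behind by a moved phrase

def pronounce(surface):
    """Contract 'used to' only when the two words are strictly adjacent."""
    out, i = [], 0
    while i < len(surface):
        if surface[i:i + 2] == ["used", "to"]:
            out.append("useta")      # nothing intervenes, so contraction applies
            i += 2
        else:
            out.append(surface[i])
            i += 1
    # Traces are silent, so they are removed from the pronounced string.
    return " ".join(w for w in out if w != TRACE)

# (17a): "the oscilloscope" has moved from the object position of "fix".
s17a = "this is the oscilloscope Tom used to fix t".split()
# (17b): "the oscilloscope" has moved from between "used" and "to".
s17b = "this is the oscilloscope Tom used t to fix the radio".split()

print(pronounce(s17a))  # this is the oscilloscope Tom useta fix
print(pronounce(s17b))  # this is the oscilloscope Tom used to fix the radio
```

Because the trace in (17b) sits between "used" and "to", the adjacency test fails and contraction is blocked without any reference to the deep structure.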

Generalized  Phrase  Structure  Grammar  (GPSG) 

Gerald Gazdar, of the University of Sussex, offered an extended interpretation of
PS grammars, adding the notions of rule schemata and metarules, which greatly reduce the
work done by transformations. Rule schemata are patterns of rules. They present sets of
rules which have some common property as a single statement [Walter; 1986].
For example, the rule:

(18) * -> * "and" *, where * is any category,

represents

(19) NP -> NP "and" NP
     VP -> VP "and" VP
     N -> N "and" N
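
Expanding a schema into the rules it stands for is a mechanical substitution, as the following sketch suggests. The representation of a rule as a (left side, right side) pair and the function name are assumptions of this illustration.

```python
def expand_schema(schema, categories, wildcard="*"):
    """Instantiate a rule schema once for every category."""
    lhs, rhs = schema
    return [
        (cat if lhs == wildcard else lhs,
         tuple(cat if sym == wildcard else sym for sym in rhs))
        for cat in categories
    ]

coordination = ("*", ("*", '"and"', "*"))          # schema (18)
for lhs, rhs in expand_schema(coordination, ["NP", "VP", "N"]):
    print(lhs, "->", " ".join(rhs))
```

The same wildcard is replaced by the same category throughout a rule, which is what keeps the conjuncts and the result of the coordination in one category.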

A metarule creates new rules from rules which already exist. If a grammar contains the
productions

(20) VP -> V NP
     VP -> V NP PP
     VP -> V NP NP
     VP -> V NP VP

then the metarule

(21) VP -> V NP W => VP[pas] -> V W PP

where W stands for any (possibly empty) string of categories, creates the following new
rules:

(22) VP[pas] -> V PP
     VP[pas] -> V PP PP
     VP[pas] -> V NP PP
     VP[pas] -> V VP PP
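
Metarule (21) can be read as a function from rules to rules. The sketch below is one way to write it down; the tuple representation of rules and the function name are assumptions of this illustration.

```python
def passive_metarule(rules):
    """VP -> V NP W  =>  VP[pas] -> V W PP, with W possibly empty."""
    new_rules = []
    for lhs, rhs in rules:
        if lhs == "VP" and rhs[:2] == ("V", "NP"):
            w = rhs[2:]                              # whatever follows V NP
            new_rules.append(("VP[pas]", ("V",) + w + ("PP",)))
    return new_rules

grammar = [                                          # productions (20)
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("VP", ("V", "NP", "NP")),
    ("VP", ("V", "NP", "VP")),
]

for lhs, rhs in passive_metarule(grammar):
    print(lhs, "->", " ".join(rhs))
```

Applied to the four productions in (20), the function yields four passive rules, one for each value of W (empty, PP, NP, and VP).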

As  in  Montague  Grammar,  each  syntactic  rule  has  associated  with  it  a  semantic  rule 
(which  operates  in  parallel)  to  create  a  semantic  representation  of  a  sentence. 

Lexical  Functional  Grammar  (LFG) 

An  LFG  consists  of  a  context-free  grammar  and  a  dictionary.  Equations  are 
associated  with  each  production  in  the  grammar  and  with  each  entry  in  the  dictionary.  The 
derivation  process  works  in  three  phases. 

Phase  1.  A  phrase  marker  is  generated.  Then  leaf  nodes  are  assigned  words  from 
the  dictionary.  Next,  all  nodes  (except  those  which  are  assigned  words)  are  marked  with 
unique  variables. 


Phase  2.  Recall  that  each  word  and  each  production  have  an  associated  equation. 
In  the  second  phase,  these  equations  are  instantiated  and  a  functional  description  is 
produced,  which  is  another  set  of  equations. 

Phase  3.  Solving  the  set  of  equations  produces  a  functional  structure.  One  solution 
indicates  a  grammatical  sentence.  Two  solutions  indicate  an  ambiguous  sentence.  No 
solutions  indicate  an  ungrammatical  sentence  (for  details  see  [Winograd;  1983]). 

Where  do  we  go  from  here? 

Robert Berwick, in his paper Strong Generative Capacity, Weak Generative
Capacity, and Modern Linguistic Theories (1984), gives a good review of the current state
of research in Transformational Grammar theory. One use of mathematical analysis has
been to diagnose grammatical formalisms as too powerful (allowing too many grammars).
For instance, Peters and Ritchie's demonstration that the theory of TG could specify any
recursively enumerable set was thought by some to indicate that TGs were too powerful.
Berwick  states,  "A  theory  that  is  too  powerful  could  generate  either  unnatural  tree 
structures  (and  so  be  too  powerful  in  terms  of  strong  generative  capacity)  or  it  could 
generate  unnatural  sentences  (and  be  too  powerful  in  terms  of  weak  generative  capacity)." 

We  want  our  theory  to  describe  all  and  only  the  natural  languages.  A  more 
restricted  formalism  is  desired  if  we  are  to  learn  about  how  humans  actually  derive 
language  (rule  systems  underlying  linguistic  behavior)  or  how  different  languages  interact. 
We cannot infer too much from a grammar which allows us to specify an arbitrary Turing
Machine computation (see Figure 15), since Turing Machines are able to specify any
recursively enumerable language.

At  the  other  end  of  the  complexity  hierarchy  (see  Figure  15),  it  has  been  shown  that 
many  reasonable  questions  about  regular  languages  are  solvable.  To  name  a  few, 
membership,  inclusion,  equivalence,  infiniteness,  and  emptiness  have  all  been  shown  to  be 
solvable  for  regular  languages.  Many  similar  questions  about  the  other  classes  of  languages 
are  either  unsettled  or  unsolvable.  Language  theory  concepts  such  as  nondeterminism  and 
the  complexity  hierarchy  depicted  in  Figure  15  allow  us  to  prove  lower  bounds  on  the 
inherent  complexity  of  certain  practical  problems  [Hopcroft  and  Ullman;1979].  We  can 
also use Automata Theory to show that certain problems are unsolvable, by showing that
the problem is equivalent to another one which has already been shown to be Turing
Machine unsolvable.
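
To see why questions such as membership and emptiness are solvable for regular languages, consider the following sketch of a deterministic finite automaton. The class, its method names, and the example automaton are assumptions of this illustration; both procedures always terminate because the set of states is finite.

```python
from collections import deque

class DFA:
    def __init__(self, start, accepting, delta):
        self.start = start
        self.accepting = set(accepting)
        self.delta = delta                    # maps (state, symbol) to state

    def accepts(self, word):
        """Membership: run the automaton once over the input."""
        state = self.start
        for symbol in word:
            state = self.delta.get((state, symbol))
            if state is None:                 # missing transition: reject
                return False
        return state in self.accepting

    def is_empty(self):
        """Emptiness: is any accepting state reachable from the start state?"""
        seen, queue = {self.start}, deque([self.start])
        while queue:
            state = queue.popleft()
            if state in self.accepting:
                return False
            for (src, _symbol), dst in self.delta.items():
                if src == state and dst not in seen:
                    seen.add(dst)
                    queue.append(dst)
        return True

# Strings over {a, b} containing an even number of a's.
even_as = DFA("even", {"even"}, {
    ("even", "a"): "odd", ("odd", "a"): "even",
    ("even", "b"): "even", ("odd", "b"): "odd",
})

print(even_as.accepts("abab"))   # True
print(even_as.accepts("ab"))     # False
print(even_as.is_empty())        # False
```

No comparable procedure exists for arbitrary Turing Machines, which is the contrast the complexity hierarchy is meant to bring out.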

Berwick demonstrates that the excess power in Chomsky's Aspects Theory comes
from unbounded deletion. In fact, he claims that all the proofs demonstrating the power of
the Aspects Theory use this erasing power to delete strings of arbitrary length. Berwick
shows that proofs given independently by Peters and Ritchie (1973), Kimball (1967), and
Salomaa (1971) all use unbounded deletion in order to demonstrate that Transformational
Grammars can generate any r.e. set. Berwick contrasts this property of the older theories
with modern Government-Binding theory, stating that current theories allow only a linear
amount of erasing.

The  trend,  then,  is  toward  a  more  restrictive  formalism.  Research  remains  to  be 
done  in  terms  of  both  strong  and  weak  generative  capacity  of  Transformational  Grammars. 
If  a  particular  theory  generates  too  many  languages,  something  can  be  learned  from  finding 
out  what  the  source  of  its  excess  capacity  is. 

Type 0. Unrestricted Languages, Turing Machines, recursively enumerable languages

Type 0. Recursive Languages

Type 1. CSL's, CSG

Type 2. CFL's, CFG, PDA

Figure 15. The Languages under Σ.

Note: A language accepted by a TM is called a recursively enumerable set; the
TM halts on every input in the language but may fail to halt on others. A
recursive set is accepted by at least one TM that halts on all of its inputs.